Abstract: Peptide deformylases (PDFs) are monomeric metal-cation hydrolases that remove
the N-formyl group (Fo). This is an essential step in the N-terminal Met excision (NME)
of proteins synthesized in eukaryotic mitochondria or chloroplasts. Although PDFs have been
identified and their structures and functions characterized in several herbaceous species,
they remain unexplored in poplar. Here, we report the first identification, by genome-wide
investigation, of two genes (PtrPDF1A and PtrPDF1B) encoding two putative PDF polypeptides
in Populus trichocarpa. One of them (XP_002300047.1), encoded by PtrPDF1B (XM_002300011.1),
was truncated and was revised into a complete sequence with high confidence on the basis of
EST support. We document that the two PDF1s of Populus are evolutionarily divergent,
likely as a result of independent duplication events. Furthermore, in silico simulations
suggested that PtrPDF1A and PtrPDF1B should possess PDF catalytic activities similar
to those of their corresponding PDF orthologs in Arabidopsis. This result will be of value for
further assessment of their biological activities in poplar, and further experiments are now
required to confirm them.

Keywords: peptide deformylase; N-terminal Met excision; in silico simulation;
genome-wide investigation; phylogenetic analysis; gene duplication; chromosome location;
gene structure display



1. Introduction 

In all organisms, the protein synthesis machinery requires newly synthesized peptides to start
systematically with methionine (Met) [1]. Approximately two-thirds of mature proteins undergo
N-terminal Met excision (NME) by Met aminopeptidase (MAP; EC 3.4.11.18), which proteolytically
removes the N-terminal Met if the residue at position two has a side-chain with a radius of gyration of
1.29 Å or less [2-5]. However, MAP cannot cleave Met carrying an N-formyl group (Fo) in eubacteria,
mitochondria and chloroplasts, where the N-terminal Met moiety must be N-formylated by a
formyltransferase [5,6].

Removal of the Fo is undertaken by peptide deformylase (PDF), and is therefore an essential
first step that allows the subsequent NME to occur in eukaryotic mitochondria or
chloroplasts [1,2,7]. Most PDFs are monomeric hydrolases, and all contain three signature sequence
motifs that form the active pocket of the enzyme and bind a metal cation: (i) GψGψAAXQ (motif 1);
(ii) EGCLS (motif 2) and (iii) HEψDH (motif 3), where ψ is a hydrophobic amino acid [8,9]. The Cys
of motif 2 and the two His residues of motif 3 coordinate the metal ion at the active site of
PDF [8,9]. PDFs are important for biological processes such as chloroplast development in
rice [10] and cell proliferation in humans [11].
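
The three signature motifs can be expressed as simple patterns. The following is a minimal sketch, not part of the original analysis: the hydrophobic residue class `HYDROPHOBIC` is an assumption made here for illustration, and the test fragment is the stretch of the revised PtrPDF1B protein spanning motifs 2 and 3, transcribed into one-letter code from Figure 2c.

```python
import re

# Assumed hydrophobic residue class for the motif placeholder (illustrative choice).
HYDROPHOBIC = "AVILMFWC"

# Signature motifs as regular expressions; X is any residue.
MOTIFS = {
    "motif 1": re.compile(f"G[{HYDROPHOBIC}]G[{HYDROPHOBIC}]AA.Q"),
    "motif 2": re.compile("EGCLS"),
    "motif 3": re.compile(f"HE[{HYDROPHOBIC}]DH"),
}

def find_motifs(protein):
    """Return the first match of each motif in the sequence, or None if absent."""
    found = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein.upper())
        found[name] = match.group(0) if match else None
    return found

# Fragment of the revised PtrPDF1B protein covering motifs 2 and 3 only.
FRAGMENT = "VLFNEGCLSFPGIYADVKRPESVKIDARDINGARFTVNLSGLPARVFQHEFDHLQGI"

found = find_motifs(FRAGMENT)
print(found)
```

The fragment deliberately excludes the motif 1 region, so only motifs 2 and 3 are expected to be reported for it.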

Recent studies, together with the release of complete genome sequences for different organisms,
have led to the identification of PDFs in eukaryotes; two PDFs have been identified in Arabidopsis [8,9],
three in rice [10] and one in humans [11]. Since these PDFs do not contain the two insertions typical of
PDF2 molecules, all eukaryotic PDFs are grouped as type 1 (PDF1). In Arabidopsis, the two PDF1s
form two subclasses: PDF1A, which localizes to the mitochondria, and PDF1B, which localizes to
plastids [1,12]. The crystal structures of Arabidopsis PDF1A and PDF1B have been determined, not
only indicating several similarities to bacterial PDFs and their functional activity in the removal of the
N-formyl group, but also showing several clear differences between AtPDF1A (At1g15390.1) and
AtPDF1B (At5g14660.1) [8,9].

Although considerable research effort has been devoted to exploring PDF structure and function
in several plant species, such as Arabidopsis [1,8,9,12] and rice [10], such research has not yet been
directed towards woody trees. In order to identify all genes encoding PDFs and explore their function
in poplar, we carried out a genome-wide investigation combined with in silico simulations. In this work,
we identified two genes encoding PDFs across the complete P. trichocarpa genome, and propose that
poplar PDFs should possess biological activities similar to those of their corresponding PDF orthologs in
Arabidopsis. This result will be valuable for further assessment of their functional roles
in poplar.

2. Results and Discussion

2.1. Identification and Characterization of PDF Genes in Populus 

To identify poplar PDF genes and their putative encoded polypeptides in the complete
P. trichocarpa genome, the Hidden Markov Model (HMM) profile of the PDF domain (PF01327) [13,14]
was used as a query to search the P. trichocarpa protein sequence data [15]. Two
non-redundant putative genes were identified as PDF genes because their encoded
polypeptides significantly matched the known PDF domain (Table 1). To verify the
identification of the two PDF genes from the JGI poplar database, their encoded proteins were further
compared by a BLASTP search against the NCBI Reference Sequence (RefSeq) database, which provides
a non-redundant and validated collection of sequences representing genomic data, transcripts and
proteins [16,17]. The two poplar PDF genes (640630 and 173925) each have
protein and mRNA counterparts in the NCBI RefSeq database (Table 1), suggesting that
they represent genuine genes and proteins. Thus, two PDF1 genes (and their
encoded PDF proteins) were identified in total across the P. trichocarpa genome by the
genome-wide investigation. The P. trichocarpa genome encodes a similar number of PDF1 genes
to several herbaceous plants, such as Arabidopsis [12] and rice [1], indicating no expansion
of the poplar PDF gene family. In contrast, expansion is common among
Populus multigene families [15]. This result might reflect a comparable need for PDF activity
in Fo removal in woody and herbaceous plants.



Table 1. Characterization and identification of peptide deformylase (PDF) genes of poplar.

JGI No.   Novel simplified   RefSeq            RefSeq            Chromosome location
          nomenclature       protein ID        RNA ID (CDS)
640630    PtrPDF1A           XP_002298107.1    XM_002298071.1    LG_I: 9208431-9211542 (+)
173925    PtrPDF1B           XP_002300047.1    XM_002300011.1    LG_I: 21839768-21844235 (-)



2.2. Revision of Poplar PDF Gene-encoding Proteins 

To provide a simplified nomenclature, the two identified PDFs were
named PtrPDF1B (XP_002300047.1) and PtrPDF1A (XP_002298107.1)
according to their best hits among the Arabidopsis orthologs (Figure 1 and Table 1).
Notably, the coding sequence (CDS, XM_002300011.1) encoding PtrPDF1B appeared to be
incomplete because it lacks both the start codon "ATG" and a stop codon, which results in truncated
N- and C-termini of the PtrPDF1B protein. To amend XM_002300011.1 into a
complete CDS, its corresponding expressed sequence tags (ESTs) were retrieved by an
online BLASTN search [18]. The perfectly matched 5' and 3' ESTs from NCBI were
aligned with the 5' and 3' termini of the CDS, respectively (Figure 2a,b). The sequence
alignment and further comparative analyses clearly demonstrated that the transcript
(XM_002300011.1) should be extended upstream of its first three nucleotides "CTA" by the
initiation codon "ATG" encoding Met, followed by 24 nucleotides encoding
8 consecutive amino acids (Figure 2a). Furthermore, the transcript should be extended downstream
of its last three nucleotides "AAA" by "TTA" encoding Leu, followed by the stop codon "TAA"
(Figure 2b). Although the CDS (XM_002300011.1) and protein sequence (XP_002300047.1) of
PtrPDF1B were obtained from the NCBI RefSeq database [16,17], they need to
be refined because they represent a truncated transcript and protein. Here, the truncation of the
PtrPDF1B CDS/transcript was confirmed with high confidence by EST support and the sequence was
revised into a complete CDS, from which the corresponding full-length PtrPDF1B protein sequence
was obtained (Figure 2a-c).
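
The revision described above amounts to adding a 5' and a 3' extension to the truncated CDS and checking that the result is a complete open reading frame. The sketch below illustrates that check under stated assumptions: the 27-nt 5' extension (ATG plus 24 nt) and the 6-nt 3' extension (TTA TAA) are transcribed from Figure 2a,b, whereas `TOY_CORE` is a short hypothetical stand-in for the truncated transcript (it only shares the reported first codon "CTA" and last codon "AAA").

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_complete_cds(seq):
    """True if seq starts with ATG, ends with a stop codon, has a length that is
    a multiple of three, and contains no in-frame stop before the last codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and all(c not in STOP_CODONS for c in codons[:-1]))

def extend_cds(core, five_prime, three_prime):
    """Join the EST-supported extensions to the truncated core sequence."""
    return (five_prime + core + three_prime).upper()

FIVE_PRIME = "ATGGCGACTTGCACTCGCAGCTCTTCG"  # ATG + 24 nt, as read from Figure 2a
THREE_PRIME = "TTATAA"                       # Leu codon + stop codon, Figure 2b
TOY_CORE = "CTATCTCGTGGAAAA"                 # hypothetical stand-in, not the real transcript

revised = extend_cds(TOY_CORE, FIVE_PRIME, THREE_PRIME)
print(is_complete_cds(TOY_CORE), is_complete_cds(revised))
```

With the real truncated transcript in place of `TOY_CORE`, the same check would be expected to pass only after both extensions are added.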

Figure 1. Alignment of the PDF sequences of poplar and Arabidopsis. A complete
amino acid sequence alignment of the two poplar PDFs with their orthologs in Arabidopsis
was performed. The two poplar PDFs shared the highest amino acid sequence
identities with AtPDF1A (AT1G15390) and AtPDF1B (AT5G14660), respectively. Motifs 1, 2 and 3 are
indicated by blue frames. White characters in grey boxes indicate strict identity, and black
characters in white boxes indicate similarity. α, η and β represent α-helices, short 3₁₀ helices
and β-sheets, respectively. (a) Sequence alignment of PtrPDF1A (XP_002298107.1) with
AtPDF1A of Arabidopsis; (b) Sequence alignment of PtrPDF1B (XP_002300047.1) with
AtPDF1B of Arabidopsis. Gaps were introduced to ensure maximum identity.



[Figure 1 image: sequence alignment panels (a) PtrPDF1A with AtPDF1A and (b) PtrPDF1B with AtPDF1B.]


Figure 2. Revision of the PtrPDF1B transcript and its encoded protein from the NCBI RefSeq
database by multiple sequence alignment. (a) Multiple sequence alignment of the 5' terminus
of the original PtrPDF1B transcript with its corresponding genomic DNA and
ESTs; (b) Multiple sequence alignment of the 3' terminus of the original PtrPDF1B
transcript with its corresponding ESTs; (c) Schematic diagram of the revised complete CDS
of PtrPDF1B and its encoded full-length protein sequence. The amino acid encoded by
each codon is displayed below the sequence alignment. Nucleotide sequences
marked with an open blue box represent the extended 5' or 3' terminus of the PtrPDF1B
transcript, while amino acid sequences marked with an open blue box represent the extended
N- or C-terminus of the PtrPDF1B protein.



(a) 5' terminus. Aligned sequences: XM_002300011.1 (1-114); LG_I:21844276-21844122 (-);
BP928962.1 (1-155); BP928242.1 (1-150); AJ773441.1 (1-147). The start codon is marked.
Translation of the aligned region:
Met Ala Thr Cys Thr Arg Ser Ser Ser Leu Ser Arg Ala Leu Leu Leu Pro Tyr Leu His Ser Arg
Thr Gln Thr Leu Cys Glu Pro Ile Arg Leu Ala Arg Phe Ser Ser Thr Met Lys Gln Pro Arg Pro Pro Ser Leu

(b) 3' terminus. Aligned sequences: XM_002300011.1 (709-774); BP935748.1 (385-308);
BP936428.1 (431-354); DN499350.1 (37-114); BU833550.1 (12-89). The stop codon is marked;
the 3' extension is TTATAA.
Translation of the aligned region:
Gly Phe Pro Ser Pro Glu Arg Ile Gly Thr Arg Arg Leu Lys Lys Val Ala Ala Gly Phe Gly Lys Leu End

(c)

atggcgacttgcactcgcagctcttcgctatctcgtgctctcctccttccttatctccatagccgaacacaaaccttatgcgagcccatc
MetAlaThrCysThrArgSerSerSerLeuSerArgAlaLeuLeuLeuProTyrLeuHisSerArgThrGlnThrLeuCysGluProIle

agactcgcccggttcagttcaaccatgaagcaacccagacctccctcactattcgttcgcagtcaggccaagcgtggaccttttctcaaa
ArgLeuAlaArgPheSerSerThrMetLysGlnProArgProProSerLeuPheValArgSerGlnAlaLysArgGlyProPheLeuLys

CAAGACCAAGTAGCAGCAACTAGTGATCTTGAATTCGAGGCACCATTAAAAATCGTGGAATATCCAGACCCTATACTCAGAGCCAAAAAC
GlnAspGlnValAlaAlaThrSerAspLeuGluPheGluAlaProLeuLysIleValGluTyrProAspProIleLeuArgAlaLysAsn

AAGCGAATTGATAGTTTTGATGACAATTTAAAGAAATTAGTTGATGAAATGTTCGATGTCATGTACAAAACTGATGGTATTGGACTCTCC
LysArgIleAspSerPheAspAspAsnLeuLysLysLeuValAspGluMetPheAspValMetTyrLysThrAspGlyIleGlyLeuSer

GCCCCGCAAGTAGGAATTAATGTTCAACTTATGGTATTCAATCCAGCTGATGAGCATGGAGAAGGAGACGAAATTGTTCTTGTTAATCCC
AlaProGlnValGlyIleAsnValGlnLeuMetValPheAsnProAlaAspGluHisGlyGluGlyAspGluIleValLeuValAsnPro

AGAGTGAATAAGTATTCGAAGAAAACTGTGCTCTTTAATGAAGGTTGCCTATCCTTTCCTGGGATATACGCCGATGTTAAGAGACCAGAA
ArgValAsnLysTyrSerLysLysThrValLeuPheAsnGluGlyCysLeuSerPheProGlyIleTyrAlaAspValLysArgProGlu

TCTGTAAAGATTGATGCGAGGGACATTAATGGTGCAAGGTTTACTGTCAACTTGTCTGGTCTTCCTGCACGGGTTTTCCAGCATGAATTC
SerValLysIleAspAlaArgAspIleAsnGlyAlaArgPheThrValAsnLeuSerGlyLeuProAlaArgValPheGlnHisGluPhe

GACCATTTACAGGGGATTCTGTTCTTTGATAGAATGACTGAAGAAGTTCTTGATAGTATTCGTCCAGGGCTACAGGCCTTGGAAAAGAAG
AspHisLeuGlnGlyIleLeuPhePheAspArgMetThrGluGluValLeuAspSerIleArgProGlyLeuGlnAlaLeuGluLysLys

TACGAGGATAAGACTGGATTCCCAAGTCCTGAAAGAATAGAAACACACAGATTGAAGAAGGTTGCCGCAGGTTTTGGAAAATTATAA 807
TyrGluAspLysThrGlyPheProSerProGluArgIleGluThrHisArgLeuLysLysValAlaAlaGlyPheGlyLysLeuEnd

2.3. Divergence in Poplar PDF1s

Divergence among PDF1s that might give rise to functionally distinct proteins has been found in
herbaceous plants, such as Arabidopsis and rice. To examine whether similar PDF1 divergence occurs in
Populus, an unrooted tree was constructed by both the Neighbor-Joining [19] and Minimum-Evolution
methods using MEGA 5.0 [20], based on alignments of the full-length PDF protein sequences (Figure 3a).
The tree topologies generated by the two methods were comparable, without changes at the branches, and
were supported by high bootstrap values of >60%, suggesting a reliable unrooted tree
topology in which two distinct clans occur: the PDF1 and PDF2 clans (Figure 3a). Phylogenetic
analysis demonstrates that the PDF1s of Populus are encoded by evolutionarily divergent genes, which is
consistent with previous reports in Arabidopsis and rice (PDF1A and PDF1B; Figure 3a) [2].
In addition, divergence occurred between PtrPDF1A and PtrPDF1B. This is supported by an apparent
difference in their amino acid sequences, especially the relatively shorter C-terminal sequence of
PtrPDF1B. Our results indicate that the divergence of PDF1 extends to Populus as a model
woody plant, and that the divergence might have been caused by independent duplication events. It is
worth noting that another obvious divergence also exists within PDF1A (plant-type and animal-type
PDF1A), which supports previous phylogenetic analyses (Figure 3a) [2].



Figure 3. Phylogenetic analysis and gene structure display of the Populus PDF1 genes.
(a) Phylogenetic analysis of Populus PDF1 genes. Neighbor-Joining and
Minimum-Evolution bootstrap values for clans supported above the 60% level are
indicated above and below the branches, respectively, in red font. All PDF protein names and their
corresponding ID numbers used for phylogenetic analysis are listed as follows:
SpnPDF2 (Q9F2F0); EcoPDF1B (P0A6K3); DmePDF1A-1 (Q8INL3); DmePDF1A-2
(Q9VGY2); HsaPDF1A (Q9HBH1); OsaPDF1B (Q5VNN5); OsaPDF1A (B6RGY0);
AthPDF1B (Q9FUZ2); AthPDF1A (Q9FV53); PtrPDF1A (XP_002298107.1); PtrPDF1B
(XP_002300047.1). Blue diamonds mark PtrPDF1A and
PtrPDF1B from Populus. PtrPDF1B* represents the PtrPDF1B protein sequence revised in
our study; (b) Schematic representation of the intron/exon structure of the Populus PDF1
genes. Exons and introns of Populus PDF1 genes are represented by green boxes and black
lines, respectively, and their sizes can be estimated from the scale at the bottom.



[Figure 3 image. (a) Tree taxa in the PDF1 clan: PtrPDF1A, OsaPDF1A, AthPDF1A, HsaPDF1A,
DmePDF1A-1, DmePDF1A-2, EcoPDF1B, OsaPDF1B, PtrPDF1B*, AthPDF1B; in the PDF2 clan: SpnPDF2;
scale bar 0.1. (b) Gene structures of PtrPDF1A and PtrPDF1B drawn 5' to 3' against a scale
from 0 to 4000.]



Gene structure display can provide additional information on the evolutionary
relationships within multi-gene families [21]. To gain further insight into the phylogenetic
relationship of the poplar PDF1 genes, the exon/intron organization of each PDF1
gene was determined by comparing the cDNA sequences with their corresponding genomic sequences
(Figure 3b). The two evolutionarily divergent PDF1 genes in poplar exhibited different
exon/intron structures: PtrPDF1A and PtrPDF1B possess four
and six exons in their coding regions, respectively (Figure 3b). This difference in exon/intron
architecture between PtrPDF1A and PtrPDF1B supports the divergence of the poplar PDF1 genes
inferred from the phylogenetic analysis (Figure 3a).

2.4. Chromosome Location and Duplication of PDF1 Genes in Populus

In silico mapping of the gene loci showed that both PDF genes, PtrPDF1A and
PtrPDF1B, are located on Linkage Group I (LG I), one of the 19 LGs (Table 1 and Figure 4). Previous
analysis of the Populus genome identified paralogous segments created by the
whole-genome duplication event in the Salicaceae (salicoid duplication), which occurred 65 million
years ago and contributed significantly to the amplification of many multi-gene families [15]. To
determine the relationship between the PDF1 genes and these paralogous segments, the Populus
PDF1 genes were mapped to the duplicated blocks of P. trichocarpa established by
Tuskan and coworkers [15]. The distribution of PDF1 genes relative to the duplicated blocks is
illustrated in Figure 4. One of the two genes (50%), PtrPDF1B, lies within a duplicated
block, whereas PtrPDF1A lies outside the duplicated blocks, suggesting that the two genes
arose from independent duplication events. This result is consistent with the deduction
from our phylogenetic analysis above. Furthermore, the duplicated block pair harboring PtrPDF1B
carries a PDF1 gene on only one of its two blocks and lacks the corresponding duplicate, suggesting
that the paralogous gene was lost following the segmental
duplication (Figure 4). This finding is in line with the observation that most gene losses in
eukaryotes occur following whole-genome duplication [22].
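
Deciding whether a gene lies inside a duplicated block is an interval-containment test. The sketch below uses the gene coordinates from Table 1; the block coordinates in `HYPOTHETICAL_BLOCKS` are invented purely for illustration and are not the blocks defined by Tuskan and coworkers [15], so the output only demonstrates the mechanics of the test.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    group: str   # linkage group name
    start: int   # first base, inclusive
    end: int     # last base, inclusive

def within(gene, block):
    """True if the gene lies entirely inside the block on the same linkage group."""
    return (gene.group == block.group
            and block.start <= gene.start
            and gene.end <= block.end)

# Gene coordinates taken from Table 1.
GENES = {
    "PtrPDF1A": Interval("LG_I", 9208431, 9211542),
    "PtrPDF1B": Interval("LG_I", 21839768, 21844235),
}

# Hypothetical duplicated block; real block boundaries must come from [15].
HYPOTHETICAL_BLOCKS = [Interval("LG_I", 20_000_000, 23_000_000)]

placement = {
    name: any(within(gene, block) for block in HYPOTHETICAL_BLOCKS)
    for name, gene in GENES.items()
}
print(placement)
```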

2.5. In Silico Simulation of the Poplar PDFs Reveals Analogous Activities with Their Individual
Counterparts in Arabidopsis

Sequence alignment of PtrPDF1A and PtrPDF1B with the known PDF sequences from Arabidopsis
revealed high sequence similarity, especially in the three conserved function-related regions,
motif 1, motif 2 and motif 3 (Figure 1a,b). Consequently, PDF activity is expected in the two
PtrPDFs identified in poplar. However, high similarity of primary structure provides only partial
evidence for analogous catalytic activity. In silico modeling of PtrPDF1A and
PtrPDF1B was therefore performed to explore the functions of these two proteins. As Figure 5 shows,
PtrPDF1A consists mainly of helices, β-sheets, turns and random coils (Figure 5c). Its structure
closely resembles that of the known AtPDF1A (PDB code 1ZY1) protein [9], especially in the three
conserved motifs (Figure 5a). However, there are differences in regions not directly related to
function. For example, the N-terminal α1-helix region of PtrPDF1A is split into two α-helices by a
single turn, whereas in AtPDF1A it is one continuous α1-helix. A similar situation is observed between
PtrPDF1B and AtPDF1B (PDB code 3CPM) [8] (Figure 5b,d).

Figure 4. Chromosomal location of the Populus PDF1 genes. Both genes are mapped
to Linkage Group I (LG I), one of nineteen LGs. Segmentally duplicated homologous
regions in LG I and LG XVII of Populus, obtained from the work of Tuskan and
co-workers [15], are shown in matching colors. The duplicated blocks containing
PDF1 genes are connected with lines in shaded colors. Chromosome numbers (LG I and
XVII) and sizes (Mb) are indicated at the bottom and end of each chromosome. The scale at the
bottom represents a 10 Mb chromosomal distance.




[Figure 4 image: LG XVII (6.0 Mb) and LG I (35.5 Mb), with PtrPDF1A and PtrPDF1B marked; scale in Mb.]



Figure 5. Ribbon stereo views. (a) Ribbon stereo view of AtPDF1A; (b) Ribbon stereo
view of AtPDF1B; (c) Ribbon stereo view of PtrPDF1A; (d) Ribbon stereo view of
PtrPDF1B. The substrate Met-Ala-Ser is represented by a ball-and-stick model. Zn²⁺ is shown as
a purple CPK model. Ribbon colors: helices, β-sheets, turns and random coils are in red,
cyan, green and white, respectively.

As discussed above, the structures of PtrPDF1A and PtrPDF1B are similar to those of AtPDF1A and
AtPDF1B, respectively. This conclusion is further supported by analysis of the electrostatic
potential surfaces (EPS). The active sites of PtrPDF1A and PtrPDF1B are nearly the
same as those of AtPDF1A and AtPDF1B, respectively (Figure 6). In addition, the binding sites of the
substrate Met-Ala-Ser within AtPDF1A and PtrPDF1A are structurally close (Figure 6a,c). The
interaction energies (E_inter) were calculated to be -208.75 and -122.21 kcal mol⁻¹, respectively.
Electrostatic effects play a large role in the ligand binding processes, accounting for 79% and 60% of
the binding energies, respectively. For AtPDF1B and PtrPDF1B (Figure 6b,d), the interaction energies
were -199.31 and -222.30 kcal mol⁻¹, respectively. Electrostatic interactions (E_ele) rather than
van der Waals interactions (E_vdw) play a dominant role in these ligand binding processes,
contributing almost 79% and 85% of the binding energies, respectively. In particular, PtrPDF1A and
PtrPDF1B recognize the tripeptide Met-Ala-Ser, which is consistent with experiments and previous
reports [8,9]. These results support the hypothesis that the putative PDFs of poplar possess PDF
catalytic activity and act by a mechanism similar to that of their corresponding PDF orthologs in
Arabidopsis. This result is important for further study of their biological activities.
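
The reported percentages can be turned into approximate energy components. The sketch below assumes, for illustration only, that the interaction energy is the sum of an electrostatic and a van der Waals term (E_inter = E_ele + E_vdw); the percentages are the rounded values quoted above, so the derived components are approximate.

```python
# Reported interaction energies (kcal/mol) and electrostatic fractions from the text.
REPORTED = {
    "AtPDF1A": (-208.75, 0.79),
    "PtrPDF1A": (-122.21, 0.60),
    "AtPDF1B": (-199.31, 0.79),
    "PtrPDF1B": (-222.30, 0.85),
}

def decompose(e_inter, ele_fraction):
    """Split E_inter into (E_ele, E_vdw), assuming E_inter = E_ele + E_vdw."""
    e_ele = e_inter * ele_fraction
    return e_ele, e_inter - e_ele

components = {name: decompose(*values) for name, values in REPORTED.items()}
for name, (e_ele, e_vdw) in components.items():
    print(f"{name}: E_ele = {e_ele:.2f}, E_vdw = {e_vdw:.2f} kcal/mol")
```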

Figure 6. Surface electrostatic potential. (a) Surface electrostatic potential of AtPDF1A; (b)
Surface electrostatic potential of AtPDF1B; (c) Surface electrostatic potential of PtrPDF1A;
(d) Surface electrostatic potential of PtrPDF1B. The Connolly surfaces of the proteins
were created using the InsightII 2005 scripts. The electrostatic potential is indicated by the
color saturation (red for negative and blue for positive).







3. Experimental Section 



3.1. Identification of PDF Genes across Poplar Genome 



The complete protein sequence database of Populus trichocarpa v1.1 was downloaded [23].
The Hidden Markov Model (HMM) profile (Pep_deformylase.hmm) of the PDF domain
(PF01327) from the Pfam database [24] was used as a query to identify PDF genes in the
Populus protein database with the hmmsearch command of the HMMER (v3.0) program, which
is widely used to identify homologues of a protein family of interest [14,25].
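
As an illustration of how such a search result can be screened, the sketch below parses HMMER's tabular output (the format written by `hmmsearch --tblout`) and keeps hits below an E-value cutoff. The cutoff of 1e-5, the protein names and all numbers in `MOCK_TBLOUT` are assumptions made for this example; the original study does not state its threshold.

```python
def parse_tblout(text, max_evalue=1e-5):
    """Return (target, E-value, score) tuples for hits at or below max_evalue,
    sorted by E-value. Columns follow the hmmsearch --tblout layout, where the
    full-sequence E-value and score are the 5th and 6th whitespace-separated fields."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return sorted(hits, key=lambda hit: hit[1])

# Mock output with invented targets and scores (not real search results).
MOCK_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
protein_A - Pep_deformylase PF01327 1.2e-45 160.3 0.1 1.5e-45 160.0 0.1 1.0 1 0 0 1 1 1 1 mock
protein_B - Pep_deformylase PF01327 3.4e-40 142.8 0.0 4.0e-40 142.5 0.0 1.0 1 0 0 1 1 1 1 mock
protein_C - Pep_deformylase PF01327 0.52 8.1 0.0 0.9 7.5 0.0 1.0 1 0 0 1 1 1 1 mock
"""

pdf_hits = parse_tblout(MOCK_TBLOUT)
print([hit[0] for hit in pdf_hits])
```

A file of this kind could be produced with a command along the lines of `hmmsearch --tblout hits.tbl Pep_deformylase.hmm proteins.fasta`, where the output and FASTA file names are placeholders.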

3.2. Revision of Poplar PDF Proteins 

Expressed sequence tags (ESTs) were retrieved by an online BLASTN search against all Populus EST
sequences in NCBI, using the corresponding transcript/CDS from P. trichocarpa v1.1 [23] as the
query sequence. Matches with above 95% identity over an alignment of at least 100 bp were
considered to correspond to the PDF genes. Multiple sequence alignments of these
ESTs with their individual transcript/CDS sequences were performed using the ClustalW program in
BioEdit with default parameter settings [26]. Sequence alignments were manually
adjusted to maximize matching.
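
The EST selection criteria can be applied programmatically to BLAST tabular output. The sketch below assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, and so on); the hit names and values in `MOCK_HITS` are invented for illustration.

```python
def filter_est_hits(tabular, min_identity=95.0, min_length=100):
    """Keep subject IDs with identity above min_identity over at least min_length bp.
    Expects tab-separated rows: qseqid, sseqid, pident, length, ..."""
    kept = []
    for line in tabular.strip().splitlines():
        cols = line.split("\t")
        subject, identity, length = cols[1], float(cols[2]), int(cols[3])
        if identity > min_identity and length >= min_length:
            kept.append(subject)
    return kept

# Mock rows (invented): qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
MOCK_ROWS = [
    ("cds", "EST_1", "100.00", "155", "0", "0", "1", "155", "1", "155", "1e-80", "287"),
    ("cds", "EST_2", "93.50", "150", "9", "1", "1", "150", "1", "150", "1e-60", "220"),
    ("cds", "EST_3", "98.70", "147", "2", "0", "1", "147", "1", "147", "1e-72", "262"),
    ("cds", "EST_4", "99.00", "60", "1", "0", "1", "60", "1", "60", "1e-25", "105"),
]
MOCK_HITS = "\n".join("\t".join(row) for row in MOCK_ROWS)

kept = filter_est_hits(MOCK_HITS)
print(kept)
```

Here `EST_2` is dropped for low identity and `EST_4` for a short alignment.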

3.3. Phylogenetic Analysis and Gene Structural Display

The unrooted phylogenetic trees were constructed using MEGA 5.0 software [20], by both the
Neighbor-joining method [19] and Maximum Likelihood method with parameters (p-distance and
complete deletion) based on 11 aligned PDF sequences. The reliability of the phylogenetic tree was
estimated using bootstrap values with 1000 replicates. The Gene Structure Display Server (GSDS)
program [21] was used to illustrate the exon/intron organization of individual PDF genes by
comparison of the cDNA sequences and their corresponding genomic sequences.
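
The two distance parameters named above have simple definitions: complete deletion removes every alignment column that contains a gap in any sequence, and the p-distance is the proportion of remaining sites at which two sequences differ. The sketch below is a from-scratch illustration of these definitions, not MEGA's implementation, and the three aligned toy sequences are invented.

```python
def complete_deletion(alignment):
    """Drop every column that contains a gap ('-') in at least one sequence."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

# Invented toy alignment; the fourth column is removed by complete deletion.
TOY_ALIGNMENT = {
    "seq1": "MKV-LSEGCLS",
    "seq2": "MKVALSEGCLS",
    "seq3": "MRV-LTEGCLS",
}

clean = complete_deletion(TOY_ALIGNMENT)
d12 = p_distance(clean["seq1"], clean["seq2"])
d13 = p_distance(clean["seq1"], clean["seq3"])
print(d12, d13)
```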

3.4. Chromosomal Location and in Silico Simulation 

The two identified PDF genes were located in the genome of P. trichocarpa using NCBI map 
viewer [27]. Identification of duplicated regions between chromosomes was completed as described in 
Tuskan et al. [15]. 

All flexible docking simulations were performed with the modules implemented in
the InsightII 2005 software package [28] on Linux workstations, using the consistent-valence
force field (CVFF). The X-ray crystal structures of AtPDF1A (PDB code 1ZY1) [9] and AtPDF1B (PDB
code 3CPM) [8] were retrieved from the RCSB Protein Data Bank and used as templates to construct the
structures of PtrPDF1A and PtrPDF1B in the SWISS-MODEL workspace [29,30]. The two
protein models were optimized with the conjugate gradient algorithm (Discover 3.0 module).
The geometry and partial atomic charges of the tripeptide Met-Ala-Ser were determined in
the Discover 3.0 module by applying the BFGS algorithm [31] with a convergence criterion of
0.01 kcal·mol⁻¹·Å⁻¹. Following previously described procedures [32,33], docking simulations were
performed to explore the interactions of PtrPDF1A and PtrPDF1B with the tripeptide
Met-Ala-Ser using the general protocols of the InsightII 2005 software package [32,34]. The
interaction energies of the substrate with the proteins were calculated by the Docking module [34].
More details of the calculation processes can be found elsewhere [32,33].

4. Conclusions

Removal of the Fo by PDF is an essential first step of the NME occurring in
eukaryotic mitochondria or chloroplasts. Some advances have been made in exploring the structure and
function of PDFs in several plant species, such as Arabidopsis, maize and rice. However, such effort
has not yet been directed towards poplars as model woody trees. In this work, these issues are
addressed using a genome-wide investigation combined with in silico simulations.
The P. trichocarpa genome contains two evolutionarily divergent genes, PtrPDF1A and PtrPDF1B,
whose divergence might have been caused by independent duplication events. Furthermore, PtrPDF1A and
PtrPDF1B should possess PDF catalytic activity similar to that of their corresponding PDF orthologs
in Arabidopsis. These results will be a valuable resource for understanding the function of PDFs in
poplar, and further experiments based on our results should be performed in the future.